pVecSearch 2024-03-22 Retrospective
claude.icon I have put together a summary of the pVectorSearch project.
This is a Scrapbox-centric project on vector search.
Development Progress
Initial thoughts on vector search over this Scrapbox
At first, I envisioned creating an API (as a tool for myself)
Set up Qdrant on local Docker and experiment with various things, applying what was learned from Scrapbox ChatGPT Connector
Search and display hit texts and pages
Consider parallelizing the Embedding API calls
Implement administrative functions (save search queries, generate permalinks)
Enable searching across multiple people's Scrapbox projects
Consider making non-public materials searchable by vector search
Consider what unit of information to use for Cross-sectional Vector Search (vector search across multiple sources)
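Embedding API calls can be parallelized with a thread pool, because each call spends most of its time waiting on the network. This is a minimal sketch, not the project's code. `embed_batch` is a hypothetical wrapper around the embedding API that returns one vector per input text. The `batch_size` and `max_workers` values are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Vector = List[float]


def embed_in_parallel(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 100,
    max_workers: int = 4,
) -> List[Vector]:
    """Embed `texts` in batches, sending up to `max_workers` batches at once.

    `embed_batch` is a hypothetical wrapper around the embedding API: it must
    return exactly one vector per input text, in the same order.
    """
    # Split the input into consecutive batches of at most `batch_size` texts.
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the flattened list
        # lines up with `texts` even if batches finish out of order.
        results = list(pool.map(embed_batch, batches))
    return [vector for batch in results for vector in batch]


# Usage with a fake embedder (vector = [text length]) instead of a real API call.
def fake_embed(batch: List[str]) -> List[Vector]:
    return [[float(len(text))] for text in batch]


print(embed_in_parallel(["a", "bb", "ccc"], fake_embed, batch_size=2))  # [[1.0], [2.0], [3.0]]
```

In practice, the API's rate limits would cap `max_workers`, so a real version would likely also need retries with backoff.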
Presented it at the forum for Democratic Inputs to AI
Overview: (in Japanese only)
System to automatically vectorize and index Scrapbox content
Also collects data from Notion
Runs automatically at 6 a.m. daily via GitHub Actions
Splits content into 500-token chunks and vectorizes them with OpenAI's Embedding API
Uploads the vectors to a Qdrant database for vector search
It now runs daily.
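The 500-token split can be pictured as fixed-size windows over a token sequence. This is a minimal sketch, and the tokenizer is pluggable. The default whitespace split is only a stand-in: it does not work for Japanese, which has no spaces between words. With tiktoken, one would instead pass an encoding's `encode` and `decode` methods, for example from `tiktoken.get_encoding("cl100k_base")`. The project's actual splitting logic may differ, for example by respecting line or page boundaries.

```python
from typing import Callable, List, Sequence


def chunk_by_tokens(
    text: str,
    max_tokens: int = 500,
    tokenize: Callable[[str], Sequence] = str.split,
    detokenize: Callable[[Sequence], str] = " ".join,
) -> List[str]:
    """Split `text` into consecutive chunks of at most `max_tokens` tokens.

    The default whitespace tokenizer is only a stand-in (an assumption);
    a real pipeline would pass a model tokenizer's encode/decode pair.
    """
    tokens = tokenize(text)
    return [
        detokenize(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


print(chunk_by_tokens("a b c d e", max_tokens=2))  # ['a b', 'c d', 'e']
```

The windows cut at arbitrary token positions, so decoding a slice of byte-level BPE tokens can split a multi-byte character at a chunk boundary. Splitting on line boundaries first is one way to avoid that.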
I started writing reports in Scrapbox.
2023/8/9
Reorganized the code to make it easier to reuse in other projects.
2023-09-22: Moved the omni writing reports in Scrapbox to a private project
Provides vector search over the contents of The Plurality Book
Data sources: the manuscript on GitHub and its machine translation
RadicalxChange Blog articles were also collected.
Search targets can also be added from Markdown
2024-04-02
New Feature Proposal
Pass a Scrapbox page name via GET, in the URL fragment
Enables loading a page from the search results and running a prompt on it
Ability to switch search targets
Option to exclude hits on the same page
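Two of the proposed features can be sketched as small helpers. Both rest on assumptions, not on the project's code:

- The page name is taken to be the entire percent-encoded fragment.
- "Exclude hits on the same page" is read as keeping only the top-ranked hit per page.
- Each hit carries a hypothetical `"title"` key.

Browsers do not send the fragment to the server, so in a web app this parsing would happen on the client side.

```python
from typing import Dict, List, Optional
from urllib.parse import unquote, urlsplit


def page_from_fragment(url: str) -> Optional[str]:
    """Return the percent-decoded URL fragment, or None if there is none.

    Assumes the page name is the whole fragment, e.g. .../search#Some%20Page.
    """
    fragment = urlsplit(url).fragment
    return unquote(fragment) if fragment else None


def one_hit_per_page(hits: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only the first (highest-ranked) hit for each page title.

    Assumes `hits` is already sorted by descending score and that each hit
    has a hypothetical "title" key naming its source page.
    """
    seen = set()
    result = []
    for hit in hits:
        if hit["title"] not in seen:
            seen.add(hit["title"])
            result.append(hit)
    return result


print(page_from_fragment("https://example.com/search#%E3%83%86%E3%82%B9%E3%83%88"))  # テスト
hits = [
    {"title": "A", "score": 0.9},
    {"title": "A", "score": 0.8},
    {"title": "B", "score": 0.7},
]
print([h["score"] for h in one_hit_per_page(hits)])  # [0.9, 0.7]
```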
Challenges and Future Work
Implement an update function
Stable operation of services (log management, error handling, etc.)
Careful handling of search results containing non-public sources
That is an overview of the pVectorSearch project so far. The project explores using vector search to search across individuals' knowledge bases in support of collaboration and consensus building. Work on offering new options for intellectual production and discussion is likely to continue, with further updates and improvements.
nishio.icon Keep it short and sweet.
claude.icon The following is a brief summary of VectorSearch's development progress.
April 29 to May 31, 2023: Initial thoughts on vector search and the API concept
June 2-5, 2023: Experimented in a local environment, applying findings from Scrapbox ChatGPT Connector
June 6, 2023: Implemented administrative functions and released "Nishio Vector Search"
June 7, 2023: Crawled other people's Scrapbox projects, enabling search across multiple people's Scrapboxes
June 13-15, 2023: Explored making non-public materials searchable and cross-searching library holdings
June 27, 2023: Developed Omoikane Embed and presented it at the Democratic Inputs to AI forum (omoikane)
August 9, 2023: Reorganized the code to make it easier to reuse in other projects
October 17, 2023: Released Plurality Vector Search over the contents of The Plurality Book
Omoikane Embed is a system that automatically vectorizes content from Scrapbox and other sources and uploads it to a Qdrant database to provide vector search. It was upgraded in both June and July 2023 to expand its capabilities.
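What the Qdrant database does at query time can be illustrated with a brute-force cosine-similarity search. This is only a stand-in: Qdrant uses an approximate nearest-neighbor index (HNSW) rather than scanning every vector. The `(payload, vector)` point layout here is an assumption for illustration.

```python
import math
from typing import Any, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(
    query: Sequence[float],
    points: List[Tuple[Any, Sequence[float]]],
    k: int = 5,
) -> List[Tuple[float, Any]]:
    """Score every (payload, vector) point against `query`, best first."""
    scored = [(cosine(query, vector), payload) for payload, vector in points]
    # Sort on the score only, so payloads never need to be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


points = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print([payload for _, payload in top_k([1.0, 0.0], points, k=2)])  # ['a', 'c']
```

Brute force is linear in the number of chunks. That is fine for a small sketch, and it is the reason a dedicated vector database is used once several people's Scrapbox projects are indexed together.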
---
This page is auto-translated from /nishio/pVecSearch2024-03-22振り返り using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.